ABSTRACT: Autosomal recessive spastic ataxia of
Charlevoix-Saguenay (ARSACS) is a neurological disease
caused by mutations in SACS, encoding sacsin, a multidomain
protein of 4,579 amino acids. The large size of SACS and 
its translated protein has hindered biochemical analysis of 
ARSACS, and how mutant sacsins lead to disease remains 
largely unknown. Three repeated sequences, called sacsin 
repeating region (SRR) supradomains, have been recognized, 
which contribute to sacsin chaperone-like activity. We found 
that the three SRRs are much larger (> 1,100 residues) than 
previously described, and organized in discrete subrepeats. We 
named the large repeated regions Sacsin Internal RePeaTs 
(SIRPT1, SIRPT2, and SIRPT3) and the subrepeats sr1, sr2,
sr3, and srX. Comparative analysis of vertebrate sacsins in
combination with fine positional mapping of a set of human
mutations revealed that sr1, sr2, sr3, and srX are functional.
Notably, the position of the pathogenic mutations in sr1,
sr2, sr3, and srX appeared to be related to the severity of 
the clinical phenotype, as assessed by defining a severity 
scoring system. Our results suggest that the relative position 
of mutations in subrepeats will variably influence sacsin 
dysfunction. The characterization of the specific role of each 
repeated region will help in developing a comprehensive and 
integrated pathophysiological model of function for sacsin. 
Introduction

Autosomal recessive spastic ataxia of Charlevoix-Saguenay 
(ARSACS; MIM #270550) is an early-onset neurological disease 
presenting a founder effect in the Quebec regions of Charlevoix 
and Saguenay-Lac-St-Jean where the estimated carrier frequency 
is 1/22 [Bouchard et al., 1978; 1998]. The major clinical features 
of ARSACS include early-onset ataxia, later occurrence of spastic 
paraparesis, and brisk tendon reflexes, and an axonal sensory-motor 
peripheral neuropathy, with some instances of mental retardation 
or cognitive decline. Brain magnetic resonance imaging shows a 
distinct, tigroid appearance of the pons [Van Damme et al., 2009] 
and invariably an atrophied cerebellar vermis. Hypermyelination 
of the retinal nerve fibers [Bouchard et al., 1978, 1998] has long 
been considered a cardinal feature in Quebecois French-Canadian
patients, but is less obvious, or even absent, in cases from
elsewhere [Criscuolo et al., 2004; Hara et al., 2005]. Several
aspects, including the early appearance of abnormal pontocerebellar
and retinal fibers at brain neuroimaging, point to a
neurodevelopmental anomaly in ARSACS [Gazulla et al., 2012].
However, the progressive clinical course, with involvement of the
corticospinal tract and peripheral nerves in patients, as well as
studies in model mice, questioned this hypothesis and also suggested
the occurrence of a neurodegenerative process [Girard et al., 2012;
Prodi et al., 2012].

The gene responsible for ARSACS (SACS) [Engert et al., 2000]
is located on chromosome 13q12 and encodes sacsin, a protein
whose canonical variant is described as a polypeptide of 4,579 amino
acids (GenBank acc. no. NP_055178.3). The enormous size of 
the SACS gene and translated protein has considerably hindered 
biochemical studies to date, and currently much more is known 
about the genetics of ARSACS than about the function of sacsin 
in cells. Over the years, the number of ARSACS patients harboring
mutations in the SACS gene has rapidly increased. These patients
are distributed worldwide and are not limited to a few ethnicities,
and virtually every type of mutation has been found [Anheim et al.,
2008].

How mutant sacsin leads to neurodegeneration remains largely 
unknown. Earlier work had indicated that sacsin might be involved 
in chaperone-mediated protein-folding activity [Engert et al., 2000] 
and play a role in regulating the Hsp70 chaperone machinery [Parfitt 
et al., 2009]. Recent biological and comparative genomic evidence 
suggested that sacsin is organized in a repetitive supradomain
structure of ~360 amino acids, named sacsin repeating region (SRR)
[Anderson et al., 2010], which in turn might drive its function.
Biochemical characterization demonstrated that this repetitive
supradomain possesses ATPase activity, which appears to be
a requirement for sacsin function, as a disease-causing mutation
leads to an alternate conformation incapable of hydrolyzing ATP
[Anderson et al., 2010]. Moreover, this structure has been shown
to enhance the refolding efficiency of a client protein, maintain it
in soluble folding-competent states, and cooperate with members of
the Hsp70 chaperone family to increase the yield of correctly folded
client [Anderson et al., 2011]. Even more recently, sacsin has been
shown to operate as a dimer and bind GTP at its C-terminus [Kozlov
et al., 2011], with mutations in this region also resulting in
loss of function. In addition, sacsin has been indicated as a
potential substrate of the ubiquitin ligase Ube3A, the protein
responsible for Angelman syndrome (MIM #105830), a
neurodevelopmental disorder with a motor component that shares some
clinical aspects with ARSACS [Greer et al., 2010]. These observations
on the function(s) of sacsin mainly arise from preliminary analyses
of single putative domains that have been recognized along the
sacsin sequence and are presently considered hallmarks of its
structure. Finally, the generation of a sacsin knockout mouse is opening
intriguing perspectives in the exploration of the pathophysiologi- 
cal basis of ARSACS, having shown that sacsin localizes to mito- 
chondria and participates in regulation of mitochondrial dynamics 
via its interaction with dynamin-related protein 1 [Girard et al., 
2012]. 

In the present work, we aimed at expanding our knowledge on 
the structure of sacsin. Three very large (>1,100 amino acids) re- 
peated regions were detected along the sacsin amino-acid sequence, 
each characterized by the occurrence of at least three subrepeats. A 
fourth subrepeat occurred in the first and third repeated region only. 
Such organization in domains is common to sacsin in all vertebrates 
including mammals, birds, reptiles, and fish. The comparative
analysis of vertebrate sacsin architecture, in combination with the
fine positional mapping of a large set of disease-causing mutations
in human SACS, supported the functional nature of these novel
domains. Furthermore, the location of a small selection of genetic
variants detected in ARSACS was related to the phenotype by adopting
a Spastic Ataxia (SPAX) rating system of clinical severity. Scoring
mutations suggested original structure-function paradigms for
sacsin, with hints on the relative relevance of novel and known
domains in the activity of the protein.



Materials and Methods 

Human SACS Gene, mRNA, and Protein Sequences and 
SNPs 

The reference sequences for human (Homo sapiens) SACS gene
(GenBank acc. no. NC_000013.10), mRNA (GenBank acc. no.
NM_014363.4), and protein (GenBank acc. no. NP_055178.3) were
as reported in Entrez Gene at the National Center for Biotechnology
Information (NCBI) (http://www.ncbi.nlm.nih.gov/gene).
The human SACS gene SNPs mapped in this study (missense
and nonsense mutations only) were from dbSNP at NCBI
(http://www.ncbi.nlm.nih.gov/snp) and from the literature [Engert
et al., 2000; Guernsey et al., 2010; Vermeer et al., 2009]. Throughout
the manuscript, we systematically used names for both DNA and
protein variations whenever appropriate, and adopted a mutation
numbering system based on cDNA sequence as suggested by the
internationally agreed mutation nomenclature (www.hgvs.org/).
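
As a consistency check on this dual cDNA/protein naming, the codon number implied by a coding position can be compared with the residue number in the protein-level name. The following is a minimal Python sketch, assuming that c. positions are counted from the A of the ATG start codon (so that codon n spans positions 3n-2 to 3n). The helper names codon_number and numbering_is_consistent are hypothetical, and the parser handles only simple substitutions written in the style used in this manuscript.

```python
import re

def codon_number(cdna_pos: int) -> int:
    """Return the 1-based codon (residue) number that contains a coding position.

    Assumption: position 1 is the A of the ATG start codon.
    """
    if cdna_pos < 1:
        raise ValueError("coding positions start at 1")
    return (cdna_pos + 2) // 3

# Hypothetical pattern for names such as "c.1420C>T (p.R474C)".
# "X" is accepted as a residue letter because the text uses it for stop codons.
_VARIANT = re.compile(r"c\.(\d+)[ACGT]>[ACGT]\s+\(p\.[A-Z](\d+)[A-Z]\)")

def numbering_is_consistent(variant: str) -> bool:
    """Check that the cDNA position and the protein position name the same codon."""
    match = _VARIANT.fullmatch(variant)
    if match is None:
        raise ValueError(f"unsupported variant format: {variant!r}")
    cdna_pos, protein_pos = int(match.group(1)), int(match.group(2))
    return codon_number(cdna_pos) == protein_pos

if __name__ == "__main__":
    # Variant names as quoted in the Results of this manuscript.
    for name in ("c.1420C>T (p.R474C)", "c.5719C>T (p.R1907X)", "c.3932T>A (p.M1311K)"):
        print(name, numbering_is_consistent(name))
```

A check of this kind only flags numbering slips; it does not verify the nucleotide or amino-acid change itself, which would require the reference sequences.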



Pattern and Profile Searches 

Putative domains were defined using the pattern and profile
search tools included in the ExPASy Proteomics Server
(http://www.expasy.org/resources); in particular, the Simple
Modular Architecture Research Tool 6 (SMART 6) (http://smart.embl-
heidelberg.de/) [Letunic et al., 2009] and/or the ScanProsite tool
(http://prosite.expasy.org/) [de Castro et al., 2006]. Internal repeats
were detected by using the Prospero program, as included in SMART 
6. Default parameters were always used for analyses and only do- 
mains above threshold were represented. SIM, an alignment tool for 
analysis of local similarity in nucleotide and amino-acid sequences 
(http://web.expasy.org/sim/) [Huang and Miller, 1991], served to 
generate pairwise alignments of sacsin versus internal repeats 
using default parameters. The computed alignments were viewed 
using the graphical viewer program LALNVIEW (http://pbil.univ-
lyon1.fr/software/lalnview.html) [Duret et al., 1996]. Further
details on the individual computational tools and parameters used for
analyses are reported in the legends to the figures as appropri- 
ate. Domains were drawn using the MyDomains image creator 
(http://prosite.expasy.org/mydomains). 



Protein Sequence Alignments and Phylogenesis 

On the basis of the genomic analysis detailed in the Support- 
ing Information, the deduced protein sequences of orangutan, 
dog, horse, mouse, rat, chicken, zebra finch, anole lizard, fugu, 
tetraodon, stickleback, medaka, and zebrafish were obtained and 
used for alignments. Pairwise alignments of human versus the 
other vertebrate sacsin proteins were obtained by using SIM, as 
detailed above. Multiple sequence alignment of vertebrate sacsin 
proteins was obtained by using ClustalW2 with default parameters
(http://www.ebi.ac.uk/Tools/clustalw2/index.html) [Larkin et al.,
2007]. The phylogenetic reconstruction was generated by the
neighbor-joining method [Saitou and Nei, 1987], as implemented in
the Molecular Evolutionary Genetics Analysis 4 (MEGA4) software 
(http://www.megasoftware.net/) [Tamura et al., 2007]. 



Definition of a SPAX Scoring System (SPAX score) in 
ARSACS 

Definition of a clinical score in ARSACS is lagging behind, al- 
though reliable and valid composite scores have been developed 
for the highly similar inherited ataxias [Trouillas et al., 1997] and 
the hereditary spastic paraplegias [Schüle et al., 2006]. To define a
posteriori a measure of disease severity in ARSACS and to correlate 
scores with type and location of mutations in sacsin, we put together 
a measure of severity in SPAX score that takes into account the "core 
features" of ARSACS, including cerebellar ataxia, spastic paraplegia, 
and peripheral neuropathy. We are aware that SPAX scores are only 
an initial attempt to score disease severity, especially in the absence 
of functional tests, but the rating system has an intrinsic value in 
that it sums the gravity of the individual hallmarks of the disease 
through the use of validated scales. In particular, we used the
parameters developed in the Scale for the Assessment and Rating of
Ataxia [Schmitz-Hübsch et al., 2006] for cerebellar ataxia, the Spastic
Paraplegia Rating Scale for motor symptoms and spasticity [Schüle
et al., 2006], and the modified version of the Charcot-Marie-Tooth
neuropathy score [Murphy et al., 2011] for peripheral neuropathy.
In addition, cognitive impairment (from 0, absent, to 3, severe) and
ocular findings (from 0, normal, to 4, maximal abnormality) were
assessed. When visual abnormalities were detected only at optical
coherence tomography, one unit was subtracted from the subscore.
The individual items of the scales were reviewed by two independent
investigators blind to the genotype, duplicated items were removed,
data on single items were averaged, and the result was then corrected
for disease duration whenever possible (or for averaged disease
duration in a family). A grade of functional severity in ARSACS
varying from 0 to 2 (maximal severity) was then calculated.

Results 

Identification of Novel Domains in Human SACS 

Along with the original description of human SACS [Engert et al.,
2000], it was suggested that repeating regions, two of which
contain the putative ATP-binding domain of Hsp90, might occur
in the sacsin protein. At that time, human SACS was considered
to consist of a single gigantic exon spanning 12,794 bp
[Engert et al., 2000]. With the identification of nine (one noncoding
and eight coding) additional exons upstream of this gigantic
exon, the presence of conserved amino-acid sequences occurring
in triplicate along the encoded protein started to be foreseen, and
very recently the formal description of the SRR supradomain has
been proposed [Anderson et al., 2010]. In this study, a systematic
analysis of domains along the human sacsin amino-acid sequence
was performed. In particular, besides the well-known ubiquitin-like
(ubiquitin; PFAM acc. no. PF00240), DnaJ (DnaJ molecular chaperone
homology domain; SMART acc. no. SM00271), and HEPN
(Higher Eukaryotes and Prokaryotes Nucleotide-binding domain;
SMART acc. no. SM00748) domains (see Fig. 1B), two large Prospero
repeats, corresponding to the 61-1,371 and 2,473-3,893 protein
fragments of human sacsin, were detected along the polypeptide
chain (Fig. 1A). Interestingly, both repeats shared similarity with a
third homologous region in between them (along the 1,372-2,472
protein fragment), as detected by SIM analysis of sacsin protein
versus each Prospero repeat (Fig. 1A). Similar results were also
obtained by using the HHRepID program [Biegert and Söding, 2008]
(data not shown). We named these three large homologous repeating
regions Sacsin Internal RePeaTs (namely SIRPT1, SIRPT2, and
SIRPT3; see Fig. 1). In spite of their low overall similarity (e.g.,
16%-18% in human sacsin, with SIRPT1 vs. SIRPT2: 17%, SIRPT1 vs.
SIRPT3: 16%, and SIRPT2 vs. SIRPT3: 18%), each SIRPT displayed,
based on the degree of local similarity, at least three subrepeats
(Fig. 1B) that were separated by regions of extremely low similarity.
We named these subrepeat 1 (sr1), 2 (sr2), and 3 (sr3) (for the
position of the subrepeats along the protein, see Fig. 1B). Notably,
each sr1 contained a well-recognizable HATPase_c (Histidine
kinase-like ATPases; SMART acc. no. SM00387) domain, which is adopted
by the ATP-binding and catalytic domain of (among others) the
members of the vast GHKL class of proteins (so called after the
founding members of the class: DNA Gyrase, Hsp90, bacterial histidine
Kinases, and MutL) [Dutta and Inouye, 2000] (Fig. 1B). Also, within
the SIRPT architecture, the sr1s and sr2s virtually corresponded to
the SRR supradomain of Anderson et al. [Anderson et al., 2010]
(Fig. 1B). On the other hand, sr3 revealed no obvious relationship to
any of the domains so far acknowledged in databases. Besides the
sr1, sr2, and sr3 domains described above, another repeated region
could be identified in SIRPT1 and SIRPT3 in the area of very limited
similarity. In fact, a long repeated region in SIRPT1 shared
similarity with a homologous region in SIRPT3. We named this region
srX (Fig. 1B). The srX domain had no obvious counterpart in the
significantly shorter SIRPT2 (see Fig. 1A and B). Also, srX had no
obvious similarity to any of the domains so far acknowledged in
databases. Interestingly, in SIRPT3, the amino-acid sequence between
srX and sr3 corresponded to a sacsin region previously reported to
share limited homology with the Xeroderma Pigmentosum
complementation group C binding (XPCB) domain of hHR23A [Kamionka
and Feigon, 2004] and recently implicated in interactions with the
ubiquitin ligase Ube3A [Greer et al., 2010] (Fig. 1B).
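
For readers who wish to place a residue within this architecture, the fragment coordinates quoted above can be turned into a simple lookup. The sketch below is illustrative only: it assumes that the SIRPT boundaries coincide with the 61-1,371, 1,372-2,472, and 2,473-3,893 fragments given in the text, and the names SIRPT_REGIONS, sirpt_of, and region_length are hypothetical. Subrepeat boundaries (sr1, sr2, sr3, srX) are not included because they are given only in Fig. 1B.

```python
from typing import Optional

SACSIN_LENGTH = 4579  # residues in the canonical human sacsin

# Assumed SIRPT boundaries (1-based, inclusive), taken from the fragment
# coordinates quoted in the text for the repeats and the region between them.
SIRPT_REGIONS = {
    "SIRPT1": (61, 1371),
    "SIRPT2": (1372, 2472),
    "SIRPT3": (2473, 3893),
}

def sirpt_of(residue: int) -> Optional[str]:
    """Return the SIRPT containing a residue, or None if it lies outside all three."""
    if not 1 <= residue <= SACSIN_LENGTH:
        raise ValueError(f"residue {residue} is outside 1..{SACSIN_LENGTH}")
    for name, (start, end) in SIRPT_REGIONS.items():
        if start <= residue <= end:
            return name
    return None

def region_length(name: str) -> int:
    """Length in residues of a named SIRPT."""
    start, end = SIRPT_REGIONS[name]
    return end - start + 1

if __name__ == "__main__":
    for name in SIRPT_REGIONS:
        print(name, region_length(name))
    covered = sum(region_length(n) for n in SIRPT_REGIONS)
    print(f"fraction of sacsin covered: {covered / SACSIN_LENGTH:.2%}")
```

Under these assumed boundaries each repeat exceeds 1,100 residues, in line with the sizes stated in the text.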



Conservation of Sacsin Structural Organization among 
Vertebrates 

Comparative analysis of homologous proteins across phylogenetically
distant species represents a powerful method for detecting
conserved structural elements in proteins. Comparison of human
sequences with sequences of other mammals, avians, reptiles, and
teleost fish is valuable; in particular, teleosts offer maximal
stringency for sequence comparisons among vertebrates. On this
conceptual basis, we compared the amino-acid sequences of sacsins
from human to fish, having verified that (1) genes encoding sacsin
proteins are found in all vertebrate genomes sequenced so far, and
(2) sacsin proteins may have similar functional role(s) in all
vertebrates, as supported by the evidence of similar expression
patterns in mammals [Engert et al., 2000; Parfitt et al., 2009] and
fish (such as zebrafish; see Supp. Fig. S1 and Supp. Table S1). In
particular, as a result of a comprehensive gene analysis among
vertebrates, sacsin proteins were deduced from human and 13 other
vertebrate species, namely
five mammals (orangutan, mouse, rat, horse, and dog), two birds 
(chicken and zebra finch), one reptile (anole lizard), and five fish 
(zebrafish, tetraodon, fugu, stickleback, and medaka). Then, the 
protein sequences were compared (for details, see Supp. Fig. S2), 
and the phylogenetic relationships among them are summarized in 
Figure 2A. With respect to the human protein, the other mam- 
malian sacsins exhibited an overall degree of similarity (amino-acid 
identity) that varied from ~99% to ~93%, whereas the bird sacsins 
revealed an overall similarity of ~84% and the reptilian sacsin of 
83% (for details, see Supp. Table S2). Fish proteins shared an overall 
degree of similarity with human sacsin that varied from ~70% to 
~68% (Supp. Table S2). As expected, degrees of similarity locally 
varied along the protein sequence. The local degree of similarity is 
depicted in Figure 2B. In spite of these differences, all vertebrate sac- 
sins conserved the same structural architecture as the human sacsin 
(see Supp. Fig. S3). 
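
Overall similarity is reported here as amino-acid identity. As an illustration of that measure only, the following Python sketch computes percent identity from two sequences that have already been aligned, assuming identity is counted over the columns in which neither sequence has a gap (alignment programs may use a different denominator). The function name percent_identity and the toy sequences are hypothetical and are not sacsin fragments.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared

if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    print(f"{percent_identity('MKT-LLV', 'MRTALLI'):.1f}%")
```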



Comparative Analysis of Vertebrate Protein Architecture
and Positional Mapping of Human SACS Mutations Reveal
the Functional Nature of the Sacsin Repeated Domains

On the basis of our SIRPT-centered protein architecture and
sequence similarity data, the following intra/intersequence
alignment strategy was applied to identify and typify unique
conserved elements in the repeated domains of the vertebrate
sacsin proteins. Namely, the amino-acid sequences corresponding to
SIRPT1-sr1, SIRPT2-sr1, and SIRPT3-sr1 from the human and other
vertebrate sacsins were aligned against each other (see Supp. Fig. S4).
The same procedure was applied to the sequences corresponding
to SIRPT1-sr2, SIRPT2-sr2, and SIRPT3-sr2 (see Supp. Fig. S5),
to SIRPT1-sr3, SIRPT2-sr3, and SIRPT3-sr3 (see Supp. Fig. S6),
and to SIRPT1-srX and SIRPT3-srX (see Supp. Fig. S7) of the human
and other vertebrate sacsins. Notably, in spite of the highly
Figure 1. Identification of domains in human sacsin. A: (Upper panel) Internal repeats above threshold were detected by Prospero. (Lower
panel) Pairwise sequence alignments of human sacsin versus the first and the second Prospero repeat (corresponding to amino acids 61-1,371
and 2,473-3,893, respectively) were generated by SIM. The computed alignments were visualized by LALNVIEW. Percent identity is reported
in the figure. Different colors indicate different degrees of similarity (amino-acid identity) along the aligned sequences (black: 100%; white:
nothing detected). B: Sacsin Internal RePeaTs (SIRPTs) and relevant subrepeats 1 (sr1), 2 (sr2), 3 (sr3), and X (srX) within SIRPTs are indicated;
sacsin repeating region (SRR) supradomains defined by Anderson et al. (2010, 2011) are indicated as SRR1 (amino acids 107-505), SRR2 (1,471-1,921),
and SRR3 (2,539-2,922), with each supradomain composed of an sr1, an sr2, and an sr1-sr2 connecting (linker) region. Please note that sr1 starts
23-27 amino acids upstream of the C-terminus of the SRR domain (with SRR virtually starting with the HATPase_c domain) and sr2 ends 38-52 amino
acids downstream of the N-terminus of the SRR domain. Putative domains above threshold as detected by using SMART 6 and/or ScanProsite are
also indicated: ubiquitin-like (ubiquitin; PFAM acc. no. PF00240), HATPase_c (histidine kinase-like ATPases; SMART acc. no. SM00387), DnaJ (DnaJ
molecular chaperone homology domain; SMART acc. no. SM00271), HEPN (higher eukaryotes and prokaryotes nucleotide-binding domain; SMART
acc. no. SM00748). For the sake of clarity, the putative sacsin XPCB domain is also shown. Domains were drawn using the MyDomains image creator.



selective alignment procedure, a number of amino-acid residues
remained conserved at the same positions of matching repeated
domains.

If the sacsin repeated domains are functional, the amino acids 
that are found in these conserved positions should then be con- 
sidered critical for sacsin function. Accordingly, in such repeated 
and conserved positions, one should expect to find more mis- 
sense mutations associated with disease (missense pathogenic) than 
missense mutations not associated with disease (missense non- 
pathogenic) and/or nonsense (protein truncating) pathogenic mu- 
tations [Miller and Kumar, 2001; Miller et al., 2003]. To test this 
hypothesis, we collected missense (pathogenic and nonpathogenic)
and nonsense mutations that have been reported to occur in human
sacsin. In particular, Table 1 represents the recent update (January
2012) of all the acknowledged missense and nonsense mutations
that are clearly pathogenic in ARSACS patients of different
geographic origins (Supp. Appendix I lists pathogenic missense
and nonsense mutations identified later than January 2012
and frameshift mutations not used in this study). Furthermore,
Supp. Table S3 represents the list of all the missense mutations that
have been described as SNPs in humans up to January 2012; for
the most part, these mutations were recognized as undoubtedly
nonpathogenic and were used for analysis (for details, see legend
to Supp. Table S3) (Supp. Appendix II and Supp. Appendix III
report a recent update of SNPs from dbSNP and NHLBI Exome
Sequencing Project, respectively). Detailed positional information
Figure 3. Relative amount of conserved versus nonconserved missense mutations in SIRPT sr1, sr2, sr3, and srX domains. When mapped on our
multiple alignments (see Supp. Figs. S4-S7), in each domain "conserved" (i.e., identical, conserved and semi-conserved, as assessed by ClustalW)
pathogenic missense mutations were invariably over-represented with respect to missense nonpathogenic mutations (for details, see Supp. Table
S5). Unclear mutations (i.e., variants not yet clearly associated with disease; for details, see Supp. Tables S3-S5) were omitted from the analysis.



and distribution of the mutations in the various domains along
the human protein are summarized in Supp. Figure S2 and Supp.
Table S4. All the mutations falling in positions within the sr1, sr2, sr3,
and srX domains have been represented in Supp. Figures S4-S7. The
relative amounts of missense pathogenic mutations, on one hand,
and missense nonpathogenic mutations, on the other, expressed
as percent of conserved vs. nonconserved mutations, are reported
in Figure 3. As expected, with respect to the group of missense
nonpathogenic mutations, missense pathogenic mutations were
invariably over-represented in conserved positions in sr1, sr2, sr3, and
srX (for details, see Supp. Table S5), thus suggesting that the four
repeated domains of the SIRPT regions identified in this work (which
include, with sr1 and sr2, and go beyond, with sr3 and srX, the SRR
design; see Fig. 1B) [Anderson et al., 2010] do play a functional role
in the sacsin protein.
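
The conserved versus nonconserved tally can be illustrated with a short Python sketch. It assumes that conservation is read from the ClustalW consensus line, where "*" marks identical, ":" conserved, and "." semi-conserved columns, and that each mutation has already been mapped to an alignment column. The function name percent_conserved, the toy consensus string, and the column lists are hypothetical and do not reproduce the data of Supp. Table S5.

```python
from typing import Sequence

# ClustalW consensus symbols counted here as "conserved":
# "*" identical, ":" conserved substitution, "." semi-conserved substitution.
CONSERVED_SYMBOLS = frozenset("*:.")

def percent_conserved(consensus: str, columns: Sequence[int]) -> float:
    """Percent of mutations (given as 0-based alignment columns) at conserved columns."""
    if not columns:
        raise ValueError("at least one mutation column is required")
    hits = sum(1 for col in columns if consensus[col] in CONSERVED_SYMBOLS)
    return 100.0 * hits / len(columns)

if __name__ == "__main__":
    # Hypothetical 10-column consensus line and mutation positions.
    toy_consensus = "*:. *  :*."
    toy_pathogenic = [0, 1, 4, 8]
    toy_nonpathogenic = [3, 5, 6, 7]
    print("pathogenic:", percent_conserved(toy_consensus, toy_pathogenic))
    print("nonpathogenic:", percent_conserved(toy_consensus, toy_nonpathogenic))
```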

The functional nature of sr1, sr2, sr3, and srX is also supported
by the observation that in human sacsin pathogenic missense
mutations were found to be over-represented in these domains with
respect to the regions between domains (interdomains), as
qualitatively assessed by calculating the likelihood of occurrence of
missense pathogenic mutations, that is, the ratio of the percentage of
mutations in a given region to the percentage of amino acids of the
protein in the same region (for details, see Table 2). In particular,
the calculated likelihood was 1.97, 1.42, 0.79, and 0.51 for the sr1,
sr2, srX, and sr3 domains, respectively, with respect to 0.35 for the
interdomains.
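
The likelihood defined above is a simple ratio and can be recomputed from the region lengths and missense mutation counts listed in Table 2. The Python sketch below does so for the four subrepeat classes and the interdomains; the function and variable names are hypothetical, and rounding to two decimals is an assumption made to match the precision of the table.

```python
SACSIN_LENGTH = 4579   # residues in whole sacsin (Table 2)
TOTAL_MISSENSE = 37    # pathogenic missense mutations in whole sacsin (Table 2)

# region -> (length in amino acids, number of pathogenic missense mutations),
# as listed in Table 2.
REGIONS = {
    "sr1": (817, 13),
    "sr2": (436, 5),
    "sr3": (481, 2),
    "srX": (1098, 7),
    "Interdomains": (1422, 4),
}

def likelihood(length_aa: int, n_mutations: int,
               protein_length: int = SACSIN_LENGTH,
               total_mutations: int = TOTAL_MISSENSE) -> float:
    """Return (% mutation) / (% protein) for one region."""
    pct_protein = 100.0 * length_aa / protein_length
    pct_mutation = 100.0 * n_mutations / total_mutations
    return pct_mutation / pct_protein

if __name__ == "__main__":
    for region, (length_aa, n_mutations) in REGIONS.items():
        print(f"{region:13s} {likelihood(length_aa, n_mutations):.2f}")
```

A value above 1 means that a region carries more pathogenic missense mutations than expected from its share of the protein length; a value below 1 means fewer.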



Functional Relevance of sr1, sr2, srX, and sr3 in Sacsin
Protein Based on Composite SPAX Scores Analysis

To investigate the putative functional relevance of the various
repeated domains that result from the proposed new sacsin
architecture, we analyzed the clinical phenotype in patients selected
for having, in a given domain (i.e., sr1, sr2, srX, and sr3), a missense
pathogenic mutation (1) in homozygosis or (2) in heterozygosis
with a frameshift mutation, a stop mutation, or a macrodeletion
(see Table 3). It is reasonable to think that in an autosomal recessive
disorder, such as ARSACS, frameshift mutations, stop mutations,
and macrodeletions can abolish sacsin function, although other
mechanisms, such as dominant-negative effects, cannot totally be
excluded until functional tests are performed. Under these
conditions, we expect that differences in the clinical phenotypes
observed in patients (1) are due to the effect(s) of the missense
mutation on protein function and (2) provide (at least in part)
information on the functional relevance of the protein domain where
the missense mutation acts. In fact, although the nature of the
substituted amino acid may contribute per se to the severity of the
phenotype, it cannot be ignored that the effect of an amino-acid
substitution depends on the protein domain where the substitution
occurs. As a means to evaluate the pleomorphic clinical phenotype of
ARSACS, we defined a composite SPAX score, which takes into account
the major core features (cognitive, cerebellar, spasticity, peripheral
nerve, and retinopathy) that are part of the disease. This scoring
system is largely based on validated rating scales for spasticity,
peripheral neuropathy, and cerebellar function, corrected for disease
duration and used to evaluate the severity of the clinical phenotype
(see Table 3).

As a way to define the maximal severity of the disease, in our
analysis, we initially calculated SPAX scores from patients in which
both alleles were predicted to generate truncated sacsins (due to the
presence on both alleles of either a frameshift mutation or a stop
mutation or a macrodeletion; for a description of various combinations
of alleles, please see Supp. Table S6). As shown in
Figure 4, these patients formed a homogeneous group that ranked
at the highest SPAX scores among those calculated in this study
(for comparison, see also Table 3), with values varying from 1.48 to
1.84. Conversely, when SPAX scores were calculated from patients
carrying a pathogenic missense mutation in sr1, sr2, srX, or sr3 (in
homozygosis or heterozygosis with a frameshift mutation, a stop
mutation, or a macrodeletion, as described above), it was evident
that the severity of the clinical phenotype varied widely (see Fig. 4
and Table 3), from values similar to those observed in patients
carrying (a) truncated protein(s), for example, 1.69 for c.1420C>T
(p.R474C)/c.5719C>T (p.R1907X), which suggests nearly complete
abolition of protein function, to significantly lower values, for
example, 0.69 for c.3932T>A (p.M1311K)/c.3932T>A (p.M1311K), which
suggests persistence of partial or residual protein activity. Overall,
the presence of missense pathogenic mutations in sr1, sr2, srX,
or sr3 established on average a set of phenotypes (i.e., SPAX scores)
significantly less severe (i.e., lower) than those observed for
mutations that generated truncated proteins (ANOVA; P < 0.0001). We
assumed that such behavior correlated with the relevance that the
domain in which the mutation falls has for sacsin activity. In
particular, a trend toward lower SPAX scores passing from sr1 and sr2
to srX and sr3 could be observed, suggesting (1) that alterations in
srX and sr3 cause less harmful, although measurable, effects on the
function of the protein with respect to sr1 and sr2, and thus (2) that
srX and sr3 play a "minority" role in the operational mechanism of
the protein with respect to sr1 and sr2.

Table 2. Percentage of the Amino Acids in a Given Region (% Protein), Percentage of the Mutations in the Same Region (% Mutation),
and Likelihood of a Mutation Occurring in the Region (% Mutation/% Protein), Calculated as the Ratio of the Percentage of Mutations in a
Given Region to the Percentage of Amino Acids of the Protein in the Region

Region                             Whole sacsin  Ubiquitin-like  sr1    sr2    sr3    srX    XPCB  DnaJ  HEPN  Interdomains
Protein (fragment) length (aa)     4,579         72              817    436    481    1,098  76    60    117   1,422
% Protein                          100           1.57            17.84  9.52   10.51  23.98  1.66  1.31  2.56  31.05
Missense mutations                 37            0               13     5      2      7      0     3     3     4
% Mutation                         100           0               35.13  13.51  5.41   18.92  0     8.11  8.11  10.81
Likelihood (% mutation/% protein)  1.00          0.00            1.97   1.42   0.51   0.79   0.00  6.19  3.17  0.35

aa, amino acids.




Discussion 

In this study, a systematic inspection of vertebrate sacsins has 
been carried out to identify repeated domains along the protein. 
By using a combination of standard databank consulting tools and 
bioinformatics methods, three large (>1,100 amino acids) repeated 
regions have been identified. These internal repeats, named SIRPT1,
SIRPT2, and SIRPT3, cover ~84% of the protein sequence, and
each contains three subrepeats, named sr1, sr2, and sr3, with sr1 and
sr2 falling into the SRR supradomain of Anderson et al. [Anderson et al.,
2010]. In addition, a fourth subrepeat, named srX, occurs in the first
and the third internal repeat only, in a region between sr2 and sr3.
Our SIRPT-based architectural structure is invariably conserved in
all vertebrate sacsins. This is not unexpected, as vertebrate sacsins 
share a high degree of similarity at both global and local level (this 
study), and most probably exert similar functional roles, a notion 
also supported by the observation that similar expression patterns 
can be found in both mammals [Engert et al., 2000; Parfitt et al., 
2009] and fish (this study). 

All the different subrepeats identified within the SIRPT
architecture most likely represent regions involved in sacsin function.
To test this, we have developed a strategy that combines
very stringent alignments of the vertebrate sacsin domains
with positional mapping of the human SACS mutations (for
details, see Results). At least two pieces of evidence
from our analyses indicate that the different subrepeats
identified do represent functional regions. First, in sr1, sr2,
srX, and sr3, missense pathogenic mutations are invariably
over-represented in conserved positions with respect to missense
nonpathogenic mutations. Second, missense pathogenic mutations are
over-represented in sr1, sr2, sr3, and srX with respect to the
regions between domains [Miller et al., 2003], this scheme being fully
applicable also to the well-known DnaJ and HEPN domains. All
together, these findings indicate that there is a strong tendency 
in the sacsin protein to gather the missense mutations associated 
with disease within the newly identified or the already known 
domains. 

Sacsin is considered to operate in a chaperone-like manner, but
very limited information is available on its activity, mainly because of
the technical difficulty of handling such an unusually long
protein with standard biochemical, cellular, or molecular
biology assays [Anderson et al., 2010; Kozlov et al., 2011; Parfitt
et al., 2009]. Under such circumstances, obtaining information on
the functional role(s) of our sr1, sr2, srX, and sr3 domains is
a difficult task. To gain new insight into the contribution
of the newly identified domains to the activity of the protein, we
developed a procedure that evaluates the functional
relevance of the domains by measuring the severity of the clinical
phenotype, quantified in terms of SPAX score, in patients selected
for carrying missense pathogenic mutations in sr1, sr2, srX, and sr3
in homozygosity or heterozygosity with a null allele (for details,

[Figure 4: scatter dot plot. Y-axis: SPAX score (tick marks from 0.5 to 2.0). X-axis categories: Truncation, sr1, sr2, srX, sr3, DnaJ, HEPN.]

Figure 4. Composite SPAX (Spastic Ataxia) scores versus sacsin repeated domains. This scatter dot plot shows the assortment of SPAX scores
from patients carrying a pathogenic missense mutation in sr1, sr2, srX, or sr3 in homozygosis or heterozygosis with a frameshift mutation, a stop
mutation, or a macrodeletion (for details, see Table 3). SPAX scores from patients in which both alleles were predicted to generate truncated
proteins (for the presence on both alleles of either a frameshift mutation or a stop mutation or a macrodeletion) are also represented (for details,
see Supp. Table S6). For comparison, SPAX scores from patients carrying a pathogenic missense mutation in DnaJ or HEPN in homozygosis or
heterozygosis with a frameshift mutation, a stop mutation, or a macrodeletion are also shown (for details, see Table 3). Within each category, the
horizontal line indicates the calculated mean value.



see Results). Despite the limits of this approach,
essentially because of the still limited number of patients in
each group, our analysis indicates that: (1) patients
carrying missense pathogenic mutations in homozygosity or
heterozygosity with a null allele exhibit significantly milder phenotypes,
that is, lower SPAX scores, than patients carrying a null mutation
on each allele (a condition that is predicted to fully abolish protein
function; for details, see Results); (2) mean SPAX scores decrease
passing from sr1 to sr3, with sr1 (1.06) ≈ sr2 (1.10) > srX (0.83) > sr3
(0.71), which suggests that alterations in srX and sr3 are less
damaging in patients than those in sr1 and sr2, and thus that srX and
sr3 play a less determinant role in the operational mechanism of
the protein than sr1 and sr2. Nonetheless, we recognize
that our data should be weighed cautiously and that additional
determinants of severity might emerge from future functional
tests. In this context, it must be underlined that our simplified
approach cannot easily take into account the possible
contribution of the nature of the amino-acid substitution to the
severity of the phenotype. Thus, we assumed that the effect of
an amino-acid substitution depends on the protein domain in which
the substitution falls and operates rather independently of
the nature of the mutation. Support for this assumption comes from
the observation that the same type of amino-acid change (see, e.g.,
R-to-C, which occurs thrice in sr1 and once in sr2) may result in either
high (in sr2) or medium-to-low (in sr1) SPAX scores (for details, see
Table 3).
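
The per-domain comparison of mean SPAX scores can be sketched as a simple grouping step. The snippet below is illustrative only: the function name `mean_spax_by_domain` and every score in `patients` are hypothetical placeholders, not the values reported in Table 3.

```python
from statistics import mean

# Hypothetical (domain, SPAX score) pairs; NOT the values from Table 3.
patients = [
    ("sr1", 1.2), ("sr1", 0.9),
    ("sr2", 1.3), ("sr2", 1.0),
    ("srX", 0.8),
    ("sr3", 0.7), ("sr3", 0.6),
]

def mean_spax_by_domain(records):
    """Group scores by the domain carrying the missense mutation
    and return a {domain: mean score} dictionary."""
    groups = {}
    for domain, score in records:
        groups.setdefault(domain, []).append(score)
    return {domain: mean(scores) for domain, scores in groups.items()}

means = mean_spax_by_domain(patients)
# Domains ordered from highest to lowest mean score
ranking = sorted(means, key=means.get, reverse=True)
for domain in ranking:
    print(f"{domain}: mean SPAX = {means[domain]:.2f} (n = "
          f"{sum(1 for d, _ in patients if d == domain)})")
```

Printing the group size next to each mean keeps the small-sample caveat stated above visible when the ordering of domains is read off.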



Our results extend and refine the current knowledge on the
organization of some sacsin domains. In particular, the sr1 and
sr2 domains identified in this work substantially form the SRR
supradomain recently defined by others [Anderson et al., 2010].
This supradomain is composed of an N-terminal portion (~160
residues), which is homologous to the HATPase_c domain of Hsp90,
and a C-terminal portion (~200 residues), which consists of a novel
sequence invariably connected to the HATPase_c domain [Anderson
et al., 2010]. Our bioinformatics approach divides this SRR
supradomain into two well-defined repeated domains, that is, sr1 and
sr2, which are separated by an evident nonrepeated linker segment.
This organization is coherent with a system that works as an
Hsp90-like protein. In fact, in Hsp90-type chaperones, the ATP binding
domain is connected to the middle domain via a divergent linker
region. In particular, in our sacsin organization, sr1 represents the
ATP binding domain and sr2 the middle domain. Notably, in Hsp90
the middle domain invariably contains an arginine residue
accepting phosphate after ATP hydrolysis [Pearl and Prodromou, 2006].
This phosphoacceptor arginine, already observed by Anderson et al.
(2010) as invariably conserved in each C-terminal region of their
SRR supradomains, does occur in each sr2 domain. Interestingly,
our study clearly demonstrates the crucial role of this arginine in
the operational mechanism of sacsin. In fact, a mutation occurring
at one of these conserved arginines, namely c.1420C>T (p.R474C)
in SIRPT1-sr2, is associated with one of the highest SPAX scores (1.69)
found in this survey.



Table 3. Continued

Individual Items to Score Disease Severity

Score | Onset | Cognitive | Cerebellar(a) | Spasticity(b) | Peripheral neuropathy(c) | Retinal
0 | Adult | Absent | Absent | Absent | Absent | Absent
1 | Juvenile | Mild decline | Mild | Mild | Mild | No functional impairment but aware of worsened acuities
2 | Teen | IQ lower than peers | Moderate | Moderate | Moderate | Reduced night vision
3 | Early-onset | Marked mental retardation | Severe | Severe | Severe | Abnormal fundoscopy or ERG

IQ, intelligence quotient; ERG, electroretinogram.

Note. Total scoring is corrected for time of disease (yrs) under the assumption that disease severity worsens with disease duration, and it is expressed as percent.
(a) On the basis of SARA (Scale for the Assessment and Rating of Ataxia) and IACRS (Inherited Ataxia Clinical Rating Scale).
(b) On the basis of SPRS (Spastic Paraplegia Rating Scale).
(c) On the basis of CMT (Charcot-Marie-Tooth) neuropathy score (second version).
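
As an illustration of how the six items above could be combined into a duration-corrected score, the following sketch sums the item scores, expresses the total as a percentage of the maximum (6 items x 3 points), and divides by disease duration in years. The exact formula is not spelled out in this table note, so this normalization, the function name `severity_percent_per_year`, and the item keys are assumptions, not the authors' stated procedure.

```python
ITEMS = ("onset", "cognitive", "cerebellar", "spasticity",
         "peripheral_neuropathy", "retinal")
MAX_PER_ITEM = 3  # each item is scored 0 to 3 in the table above

def severity_percent_per_year(item_scores, disease_years):
    """Hypothetical duration-corrected severity score.

    Assumed formula (not confirmed by the source):
        100 * sum(items) / (3 * 6) / disease duration in years
    """
    if disease_years <= 0:
        raise ValueError("disease duration must be positive")
    total = 0
    for name in ITEMS:
        value = item_scores[name]
        if not 0 <= value <= MAX_PER_ITEM:
            raise ValueError(f"{name} score must be between 0 and 3")
        total += value
    percent_of_max = 100.0 * total / (MAX_PER_ITEM * len(ITEMS))
    return percent_of_max / disease_years

# Hypothetical patient, for illustration only
example = {"onset": 3, "cognitive": 1, "cerebellar": 2,
           "spasticity": 2, "peripheral_neuropathy": 1, "retinal": 0}
print(round(severity_percent_per_year(example, disease_years=25), 2))
```

Dividing by duration reflects the stated assumption that severity worsens with time, so two patients with the same raw total but different disease durations receive different corrected scores.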
In this article, we report for the first time the occurrence of
two novel repeated domains, namely srX and sr3, downstream of the
Hsp90-type regions discussed above. Such domains share no
similarity to any domains reported so far in databanks, and no
obvious role can be assigned to them. However, in the context of an
Hsp90-like scheme of function, srX and/or sr3, located near the
sr1/sr2 "biochemical clamp" that allows ATP binding and
hydrolysis, may participate (via dimerization, client binding,
cochaperone interaction, regulation, etc.) in sacsin chaperone activity. In
this respect, it has recently been demonstrated that a large
sacsin region (RegA), virtually corresponding to our SIRPT1, does
exhibit a chaperone-like activity that can be detected in vitro by
standard biochemical approaches [Anderson et al., 2011]. This
protein module is composed of the Hsp90-like region and of a
large undefined downstream region. Our study identifies
srX and sr3 as functional elements in that large undefined
region that are likely involved in the chaperone activity of the whole
module.

In conclusion, we used a functional comparative genomics approach
that combines bioinformatics sequence examination tools
with mapping and phenotypic analysis of human mutations to
provide novel information on the organization of sacsin in repeated
domains. In particular, our results establish that large portions of
the protein can be arranged in a few well-defined repeated
domains. The demonstration of the functional nature of sr1, sr2, srX,
and sr3 suggests that these regions contribute to the activity of the
protein. Further studies are needed to define the specific role(s) of
such domains, with a view to developing a comprehensive and
integrated model of function for sacsin in the context of cell
pathophysiology. More broadly, our approach, which combines
comparative analysis of vertebrate protein sequences/architecture,
positional mapping of human mutations, and severity of clinical
phenotype, can be tentatively applied in the biomedical field to shed
light on the functional nature of other proteins associated with disease
but of as yet unknown function.